 network topology


BML: A High-performance, Low-cost Gradient Synchronization Algorithm for DML Training

Songtao Wang, Dan Li, Yang Cheng, Jinkun Geng, Yanshu Wang, Shuai Wang, Shu-Tao Xia, Jianping Wu

Neural Information Processing Systems

In distributed machine learning (DML), the network performance between machines significantly impacts the speed of iterative training. In this paper we propose BML, a new gradient synchronization algorithm with higher network performance and lower network cost than the current practice. BML runs on a BCube network instead of the traditional Fat-Tree topology.
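For readers unfamiliar with the topology, the following is a minimal sketch of the commonly described BCube(n, k) construction, in which each server has a (k+1)-digit base-n address and two servers share a switch when their addresses differ in exactly one digit. It illustrates the topology only, not BML's synchronization algorithm, and the function names are hypothetical.

```python
from itertools import product


def bcube_servers(n, k):
    """All server addresses in BCube(n, k): (k+1)-digit tuples in base n."""
    return list(product(range(n), repeat=k + 1))


def bcube_switch_count(n, k):
    """k+1 switch levels, each containing n**k switches with n ports."""
    return (k + 1) * n ** k


def one_hop_neighbors(server, n):
    """Servers reachable through a single switch.

    Two servers attach to the same switch when their addresses differ
    in exactly one digit position.
    """
    neighbors = []
    for position, digit in enumerate(server):
        for other in range(n):
            if other != digit:
                neighbors.append(
                    server[:position] + (other,) + server[position + 1:]
                )
    return neighbors


servers = bcube_servers(4, 1)
print(len(servers), bcube_switch_count(4, 1), len(one_hop_neighbors((0, 0), 4)))
```

Under these assumptions, BCube(4, 1) has 16 servers and 8 switches, and each server reaches (k+1)(n-1) = 6 others through one switch.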




A Experiment Details

Neural Information Processing Systems

Source code for the training pipeline, tasks, and models used in this work is available as part of the supplementary material. We used the Adam [48] optimizer for all our experiments, with a learning rate of 0.001 and a batch size of 128. To solve the differential equations, both during ground-truth data generation and with the neural ODEs, we use the Tsitouras 5/4 Runge-Kutta (Tsit5) method from DifferentialEquations.jl [36].

A.1 Coupled Pendulum

The coupled pendulum dynamics are defined as [...]. We train the MP-NODE on a dataset of 500 trajectories, each randomly initialized with state values in [-π/2, π/2] for the angle θ and [-1, 1] for the angular velocity θ̇, with a time step of 0.1 s and each trajectory 10 s long. The dataset is normalized through Z-score normalization.
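The data preparation described above can be sketched as follows. This is an illustration only, not the authors' pipeline (which uses DifferentialEquations.jl). It assumes the sampling ranges are the symmetric intervals [-π/2, π/2] and [-1, 1], that the time grid includes both endpoints, and that one (θ, θ̇) pair is drawn per trajectory. The pendulum dynamics are not given in the text, so no integration is performed, and all names are hypothetical.

```python
import math
import random
import statistics

NUM_TRAJECTORIES = 500   # from the text
TIME_STEP = 0.1          # seconds, from the text
DURATION = 10.0          # seconds, from the text


def sample_initial_state(rng):
    """Draw one (theta, theta_dot) pair from the assumed symmetric ranges."""
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    theta_dot = rng.uniform(-1.0, 1.0)
    return theta, theta_dot


def time_grid(step=TIME_STEP, duration=DURATION):
    """Sample times 0, step, ..., duration (endpoint inclusion is an assumption)."""
    n_steps = round(duration / step)
    return [i * step for i in range(n_steps + 1)]


def z_score(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


rng = random.Random(0)
initial_states = [sample_initial_state(rng) for _ in range(NUM_TRAJECTORIES)]
thetas = [state[0] for state in initial_states]
normalized_thetas = z_score(thetas)
```

After normalization the values have zero mean and unit standard deviation, which is the property Z-score normalization is meant to provide.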



RELiQ: Scalable Entanglement Routing via Reinforcement Learning in Quantum Networks

Meuser, Tobias, Weil, Jannis, Lahiri, Aninda, Paraschiv, Marius

arXiv.org Artificial Intelligence

Quantum networks are becoming increasingly important because of advancements in quantum computing and quantum sensing, such as recent developments in distributed quantum computing and federated quantum machine learning. Routing entanglement in quantum networks poses several fundamental as well as technical challenges, including the high dynamicity of quantum network links and the probabilistic nature of quantum operations. Consequently, designing hand-crafted heuristics is difficult and often leads to suboptimal performance, especially if global network topology information is unavailable. In this paper, we propose RELiQ, a reinforcement learning-based approach to entanglement routing that only relies on local information and iterative message exchange. Utilizing a graph neural network, RELiQ learns graph representations and avoids overfitting to specific network topologies, a prevalent issue for learning-based approaches. Our approach, trained on random graphs, consistently outperforms existing local information heuristics and learning-based approaches when applied to random and real-world topologies. When compared to global information heuristics, our method achieves similar or superior performance because of its rapid response to topology changes.


Optimizing Quantum Key Distribution Network Performance using Graph Neural Networks

Anchan, Akshit Pramod, Acharya, Ameiy, Thungon, Leki Chom

arXiv.org Artificial Intelligence

This paper proposes an optimization of Quantum Key Distribution (QKD) networks using a Graph Neural Network (GNN) framework. Today, the development of quantum computers threatens the security of classical cryptography. Moreover, QKD networks, which are designed to protect secret communication, face several operational difficulties: adapting to dynamic conditions, optimizing multiple parameters, and using resources effectively. To overcome these obstacles, we propose a GNN-based framework that models QKD networks as dynamic graphs and extracts exploitable characteristics from their structure. The graph contains not only topological information but also characteristics specific to quantum communication (the number of edges between nodes, etc.). Experimental results demonstrate that the GNN-optimized QKD network achieves a substantial increase in total key rate (from 27.1 Kbits/s to 470 Kbits/s), a reduced average QBER (from 6.6% to 6.0%), and maintains path integrity with a slight reduction in average transmission distance (from 7.13 km to 6.42 km). Furthermore, we analyze network performance across varying scales (10 to 250 nodes), showing improved link prediction accuracy and enhanced key generation rate in medium-sized networks. This work introduces a novel operation mode for QKD networks, shifting the paradigm of network optimization through adaptive and scalable quantum communication systems that enhance security and performance.
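To put the reported before/after figures on a common scale, the short calculation below converts them to a multiplicative gain and relative changes. It uses only the numbers quoted in the abstract. The helper name is hypothetical.

```python
def relative_change(before, after):
    """Signed fractional change from `before` to `after`."""
    return (after - before) / before


# Figures quoted in the abstract.
key_rate_gain = 470.0 / 27.1                   # total key rate, Kbits/s
qber_change = relative_change(6.6, 6.0)        # average QBER, percent
distance_change = relative_change(7.13, 6.42)  # average distance, km

print(f"key rate gain: {key_rate_gain:.1f}x")
print(f"QBER change: {qber_change:+.1%}")
print(f"distance change: {distance_change:+.1%}")
```

The key rate rises roughly 17-fold, while the average QBER and the average transmission distance each fall by about 9 to 10 percent in relative terms.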




Transformer-Based Scalable Multi-Agent Reinforcement Learning for Networked Systems with Long-Range Interactions

Sinha, Vidur, Ustaomeroglu, Muhammed, Qu, Guannan

arXiv.org Artificial Intelligence

Multi-agent reinforcement learning (MARL) has shown promise for large-scale network control, yet existing methods face two major limitations. First, they typically rely on assumptions leading to decay properties of local agent interactions, limiting their ability to capture long-range dependencies such as cascading power failures or epidemic outbreaks. Second, most approaches lack generalizability across network topologies, requiring retraining when applied to new graphs. We introduce STACCA (Shared Transformer Actor-Critic with Counterfactual Advantage), a unified transformer-based MARL framework that addresses both challenges. STACCA employs a centralized Graph Transformer Critic to model long-range dependencies and provide system-level feedback, while its shared Graph Transformer Actor learns a generalizable policy capable of adapting across diverse network structures. Further, to improve credit assignment during training, STACCA integrates a novel counterfactual advantage estimator that is compatible with state-value critic estimates. We evaluate STACCA on epidemic containment and rumor-spreading network control tasks, demonstrating improved performance, network generalization, and scalability. These results highlight the potential of transformer-based MARL architectures to achieve scalable and generalizable control in large-scale networked systems.